Progressive ratio mask-based adaptive noise estimation method
Jianqing GAO, Yanhui TU, Feng MA, Zhonghua FU
Journal of Computer Applications    2023, 43 (4): 1303-1308.   DOI: 10.11772/j.issn.1001-9081.2022030384

Deep learning-based speech enhancement algorithms typically outperform traditional noise suppression-based algorithms, but they often perform poorly when the training data and test data are mismatched. To address this problem, a novel Progressive Ratio Mask (PRM)-based Adaptive Noise Estimation (PRM-ANE) method was proposed and used as the preprocessing stage of a speech recognition system. The method combines the Improved Minima Controlled Recursive Averaging (IMCRA) algorithm, which tracks noise at the frame level, with an utterance-level deep progressive learning algorithm that models the nonlinear interactions between speech and noise. Firstly, a Two-Dimensional Convolutional Neural Network (2D-CNN) was adopted to learn the PRM, whose value increases as the Signal-to-Noise Ratio (SNR) increases. Then, the utterance-level PRMs were combined with the conventional frame-level speech enhancement algorithm to perform speech enhancement. Finally, the enhanced speech, based on this multi-level information fusion, was fed directly into the speech recognition system to improve its performance. Experimental results on the CHiME-4 real test set show that the proposed method achieves a Word Error Rate (WER) of 7.42%, a relative reduction of 51.41% compared with the IMCRA speech enhancement method, indicating that the proposed enhancement method can effectively improve the performance of downstream recognition tasks.
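The statement that the mask increases with SNR can be illustrated with a generic speech-to-mixture power ratio. This is only a minimal sketch: the abstract does not give the exact PRM definition, so the formula S / (S + N), the function names, and the 10 dB gain steps below are assumptions for illustration, not the authors' implementation.

```python
def ratio_mask(snr_db: float) -> float:
    """Generic ratio mask S / (S + N) for a local SNR given in dB.

    Assumption: a plain speech-to-mixture power ratio, not the paper's PRM.
    """
    snr_linear = 10.0 ** (snr_db / 10.0)  # convert dB to a power ratio
    return snr_linear / (1.0 + snr_linear)


def progressive_masks(input_snr_db: float, snr_gains_db=(0.0, 10.0, 20.0)):
    """Masks for a chain of intermediate targets with increasing SNR.

    The gain steps are hypothetical; each target is assumed to improve
    the SNR of the input by the given number of dB.
    """
    return [ratio_mask(input_snr_db + gain) for gain in snr_gains_db]


masks = progressive_masks(0.0)
```

Under these assumptions each later target has a larger mask value, and every value stays strictly between 0 and 1.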

End-to-end speech recognition method based on prosodic features
Cong LIU, Genshun WAN, Jianqing GAO, Zhonghua FU
Journal of Computer Applications    2023, 43 (2): 380-384.   DOI: 10.11772/j.issn.1001-9081.2022010009

In traditional speech recognition systems, the optimal decoding path is determined by a language model that is constrained by its training data. As a result, a correctly pronounced utterance may still yield wrong characters in some scenarios. To use the prosodic information in speech to raise the probability of the correct character combination in the language model, an end-to-end speech recognition method based on prosodic features was proposed. Built on the attention-based encoder-decoder speech recognition framework, the method firstly used the distribution of attention coefficients to extract prosodic features such as pronunciation interval and pronunciation energy. Then, the prosodic features were combined with the decoder to improve recognition accuracy in cases of identical or similar pronunciation and semantic ambiguity. Experimental results show that, compared with the baseline end-to-end speech recognition method, the proposed method achieves relative accuracy improvements of 5.2% and 5.0% on the 1,000 h and 10,000 h speech recognition tasks respectively, and improves the intelligibility of the recognition results.
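One way to picture how attention coefficients could yield interval and energy features is sketched below. The abstract does not specify the extraction procedure, so everything here is an assumption: the function name, the use of the attention-weighted mean frame index as a token's position, and the attention-weighted mean of per-frame energy as its energy are hypothetical choices, not the authors' method.

```python
def prosodic_features(attention, frame_energy):
    """Derive per-token interval and energy features from attention weights.

    attention: list of rows, one per output token; each row holds the
        attention weights over the encoder frames.
    frame_energy: per-frame energy values, same length as each row.
    Returns (intervals, energies). Both definitions are illustrative.
    """
    centers = []
    energies = []
    for weights in attention:
        total = sum(weights)
        # Attention-weighted mean frame index: where the token "sits" in time.
        center = sum(t * w for t, w in enumerate(weights)) / total
        # Attention-weighted mean energy of the frames the token attends to.
        energy = sum(w * e for w, e in zip(weights, frame_energy)) / total
        centers.append(center)
        energies.append(energy)
    # Interval: distance in frames between consecutive token positions.
    intervals = [later - earlier for earlier, later in zip(centers, centers[1:])]
    return intervals, energies


# Toy input: two tokens, four frames, one-hot attention for readability.
toy_attention = [[1, 0, 0, 0], [0, 0, 1, 0]]
toy_energy = [2.0, 1.0, 4.0, 1.0]
intervals, energies = prosodic_features(toy_attention, toy_energy)
```

In this toy input the two tokens attend to frames 0 and 2, so the single interval is 2 frames and the energies are those of the attended frames. How such features are then combined with the decoder is not described in the abstract and is not shown here.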
